Golang Job: Site Reliability Engineer

Company

OutSystems
Portugal

Location

Remote Position
(From Everywhere/No Office Location)

Job type

Full-Time

Golang Job Details

About OutSystems
OutSystems is the pioneer of the fast-growing high-performance low-code development market. The company empowers organizations to innovate their businesses through software, and we are looking for talented and motivated individuals to join our team.

The OutSystems high-performance low-code platform is considered one of the leading application development solutions by customers, analysts such as Gartner and Forrester, and partners around the world. The platform gives organizations – from small and mid-sized businesses to enterprises – the tools to build, deliver, manage and evolve mobile and web applications that are unique to their businesses. With high-productivity, AI-assisted tools, customers can tackle any strategic challenge – from modernizing and integrating their existing applications, to automating business and workplace processes, and creating exceptional customer apps and experiences. Solutions built with OutSystems are secure, resilient, cloud-native, built to scale, and can be continuously evolved.

OutSystems is a truly global organization, with customers across 87 countries and 22 industries, more than 400 partners, and a 600,000-developer community. The company is known for creating a culture of innovation inside its customer organizations, with many working with OutSystems for more than a decade.

Founded in 2001 in Portugal, we have offices in Lisbon, the United States, United Kingdom, the Netherlands, Germany, France, the UAE, India, Japan, Hong Kong, Malaysia, Singapore, and Australia, and of course a thriving and collaborative worldwide community of remote employees. OutSystems continuously receives leadership recognition from the top analyst firms, the CODiE Awards, SD Times 100, the Cloud100, and other influential groups.

Our team members are at the core of a dynamic, industry-leading company that is helping organizations of all sizes across the globe innovate their businesses through the power of software.

About This Role
Resiliency doesn’t happen by accident. That’s particularly true for large-scale, massively distributed systems that run in the Cloud. It needs to be deliberately engineered into systems, and considered throughout the entire development lifecycle, from early design to operations.
“Site Reliability Engineering is what happens when you ask a software engineer to design an operations team.”

At OutSystems, our Site Reliability Engineers (SREs) combine advanced Software Engineering practices with mature Operations skills in order to deliver and operate highly resilient systems at scale. SREs ensure that our Cloud services meet the reliability and uptime requirements of our demanding enterprise customers. This is achieved proactively, through sound engineering practices and resilient design from day 0, as well as reactively, through a well-defined and effective on-call rotation that runs 24x7.
SREs engineer our production systems to run at scale, so that manual and repetitive work is fully eliminated. They follow blameless postmortem practices so that all incidents are well understood and problems are fixed at their root. Over time, they make our systems more robust, fault-tolerant and able to self-heal during the worst of outages and through the most unexpected circumstances.
SREs are experts in troubleshooting complex problems and can dig very deep into why systems break in production. In order to do that, they rely on observability practices like centralized logging, distributed tracing and anomaly detection. They shorten detection (MTTD) and recovery (MTTR) times by improving the accuracy of alarms and the speed of troubleshooting.
SREs leverage the latest infrastructure automation best practices and the toolset offered by Cloud Providers, so that they multiply their effectiveness and achieve greater outcomes.

Key Responsibilities and Skills:
  • Automate highly scalable and resilient cloud operations that can be executed with no customer downtime;
  • Perform blameless root cause analysis on outages and ensure action items are done;
  • Fix resiliency problems wherever they are in the product, or collaborate with product teams to do it;
  • Monitor customer infrastructure, measuring availability and system health;
  • Collaborate with customer support in recovering from escalated outages;
  • Troubleshoot complex incidents in highly distributed systems;
  • Shorten time to detection by improving the accuracy of alarms;
  • Be a key stakeholder in the design of cloud services so that they are resilient from day 0.
Minimum Qualifications and Skills:
  • Bachelor's or Master's degree in Computer Science or similar;
  • 5+ years of experience in software development or operations;
  • Programming skills in a high-level language (Python, Golang, etc.);
  • Experience with automation and IaC (Terraform, CloudFormation, Ansible, etc.);
  • Experience in troubleshooting and debugging;
  • Availability to work in shifts and be part of the 24x7 on-call rotation;
  • Fluency in English and good communication skills.
Preferred Qualifications and Skills:
  • Experience with Cloud providers (AWS, Azure and GCP);
  • Experience with Docker and Kubernetes;
  • Experience with Ingress Controllers;
  • Experience with monitoring and troubleshooting complex distributed systems;
  • Experience in designing resilient and fault-tolerant systems;
  • Experience in debugging complex, distributed systems;
  • Understanding of OAuth 2.0 and OIDC;
  • Experience with AWS CloudFront and/or other CDNs would be ideal.
Location: Portugal, remote
What do we have to offer you?
  • A company that continues to grow, change and innovate, and gives our teams the space to be proactive and creative.
  • Real career opportunities. We care about growth and development. Vertical career progression is an obvious possibility, but we also offer the possibility for lateral moves, joining different teams, and mastering specific skills.
  • Colleagues who are as smart, hardworking and driven as you, and a team that is global.
  • A company culture that is based on transparency, teamwork and excellence (as promised in our Small Book of the Few Big Rules and delivered every day).
  • Disrupting the status quo is in our DNA. In fact, it’s why our company exists.
  • We “Ask Why” a lot. It helps us connect our individual work to the bigger picture and sometimes even uncover a better way.
Are you ready for the next step in your career? Then we’d love to hear from you!

OutSystems nurtures an inclusive culture of diversity, where everyone feels empowered to be their authentic self and perform at their best. We are a company that embraces the creativity and innovation that come through diverse perspectives. We are committed to creating a team that reflects society through inclusive programs and initiatives and are proud to be an equal opportunity employer. All qualified applicants receive equal consideration regardless of race, place of origin, color, age, marital status, religion, sex, sexual orientation, gender expression or identity, protected veteran status, disability status or any other status protected by law.

